The proximal point method revisited
In this short survey, I revisit the role of the proximal point method in
large scale optimization. I focus on three recent examples: a proximally guided
subgradient method for weakly convex stochastic approximation, the prox-linear
algorithm for minimizing compositions of convex functions and smooth maps, and
Catalyst generic acceleration for regularized Empirical Risk Minimization.
Comment: 11 pages, submitted to SIAG/OPT Views and News
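For orientation, the proximal point update that the survey revisits takes the form $x_{k+1} = \operatorname{argmin}_x \{ f(x) + \tfrac{1}{2\nu}\|x - x_k\|^2 \}$ for a parameter $\nu > 0$. Below is a minimal sketch of this iteration on a convex quadratic, where the proximal subproblem has a closed-form solution; the quadratic objective and the parameter values are illustrative stand-ins, not examples from the survey.

```python
# Minimal sketch of the proximal point iteration
#   x_{k+1} = argmin_x  f(x) + 1/(2*nu) * ||x - x_k||^2
# on a convex quadratic f(x) = 0.5*x'Ax - b'x, where the prox subproblem
# reduces to solving the linear system (A + I/nu) x_{k+1} = b + x_k/nu.
# The quadratic objective is a stand-in example, not taken from the survey.
import numpy as np

def proximal_point_quadratic(A, b, x0, nu=1.0, iters=100):
    x = x0.copy()
    M = A + np.eye(len(b)) / nu             # matrix of the prox subproblem
    for _ in range(iters):
        x = np.linalg.solve(M, b + x / nu)  # exact prox step
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])      # positive definite
b = np.array([1.0, -1.0])
x_star = proximal_point_quadratic(A, b, np.zeros(2))
print(x_star, np.linalg.solve(A, b))        # prox-point iterate vs true minimizer
```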
The many faces of degeneracy in conic optimization
Slater's condition -- existence of a "strictly feasible solution" -- is a
common assumption in conic optimization. Without strict feasibility,
first-order optimality conditions may be meaningless, the dual problem may
yield little information about the primal, and small changes in the data may
render the problem infeasible. Hence, failure of strict feasibility can
negatively impact off-the-shelf numerical methods, primal-dual interior
point methods in particular. New optimization modelling techniques and convex
relaxations for hard nonconvex problems have shown that the loss of strict
feasibility is a more pronounced phenomenon than has previously been realized.
In this text, we describe various reasons for the loss of strict feasibility,
whether due to poor modelling choices or (more interestingly) rich underlying
structure, and discuss ways to cope with it and, in many pronounced cases, how
to use it as an advantage. In large part, we emphasize the facial reduction
preprocessing technique due to its mathematical elegance, geometric
transparency, and computational potential.
Comment: 99 pages, 5 figures, 2 tables
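A standard toy illustration of the failure of strict feasibility (of the kind the text studies, though this particular instance is supplied here for concreteness): a single linear constraint on the semidefinite cone can already exclude every positive definite point, confining the feasible set to a proper face that facial reduction would identify and restrict to,

\[
\mathcal{F} \;=\; \bigl\{\, X \in \mathbb{S}^n_{+} : \langle e_1 e_1^{\top}, X \rangle = 0 \,\bigr\}
\;=\; \bigl\{\, X \succeq 0 : X_{11} = 0 \,\bigr\},
\qquad \mathcal{F}\cap\mathbb{S}^n_{++} = \emptyset,
\]

since $X \succeq 0$ together with $X_{11}=0$ forces the entire first row and column of $X$ to vanish.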
Complexity of a Single Face in an Arrangement of s-Intersecting Curves
Consider a face $F$ in an arrangement of $n$ Jordan curves in the plane, no two
of which intersect more than $s$ times. We prove that the combinatorial
complexity of $F$ is $O(\lambda_s(n))$, $O(\lambda_{s+1}(n))$, and
$O(\lambda_{s+2}(n))$ when the curves are bi-infinite, semi-infinite, or
bounded, respectively; here $\lambda_k(n)$ denotes the maximum length of a
Davenport-Schinzel sequence of order $k$ on an alphabet of $n$ symbols.
Our bounds asymptotically match the known worst-case lower bounds. Our proof
settles the still apparently open case of semi-infinite curves. Moreover, it
treats the three cases in a fairly uniform fashion.
Comment: 9 pages, 5 figures
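As background not spelled out in the abstract: a Davenport-Schinzel sequence of order $k$ over $n$ symbols has no two equal consecutive symbols and contains no alternating subsequence $a \cdots b \cdots a \cdots b \cdots$ of length $k+2$. The first few extremal lengths are classical,

\[
\lambda_1(n) = n, \qquad \lambda_2(n) = 2n-1, \qquad \lambda_3(n) = \Theta\bigl(n\,\alpha(n)\bigr),
\]

with $\alpha$ the inverse Ackermann function, so the bounds above are near-linear in $n$ for fixed $s$.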
Stochastic model-based minimization of weakly convex functions
We consider a family of algorithms that successively sample and minimize
simple stochastic models of the objective function. We show that under
reasonable conditions on approximation quality and regularity of the models,
any such algorithm drives a natural stationarity measure to zero at the rate
$O(k^{-1/4})$. As a consequence, we obtain the first complexity guarantees for
the stochastic proximal point, proximal subgradient, and regularized
Gauss-Newton methods for minimizing compositions of convex functions with
smooth maps. The guiding principle, underlying the complexity guarantees, is
that all algorithms under consideration can be interpreted as approximate
descent methods on an implicit smoothing of the problem, given by the Moreau
envelope. Specializing to classical circumstances, we obtain the long-sought
$O(k^{-1/4})$ convergence rate of the stochastic projected gradient method, without
batching, for minimizing a smooth function on a closed convex set.
Comment: 33 pages, 4 figures
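For concreteness, the natural stationarity measure referred to above is the gradient norm of the Moreau envelope $f_\lambda(x) = \min_y \{ f(y) + \tfrac{1}{2\lambda}\|y - x\|^2 \}$. The sketch below shows the simplest member of the model family, the linear (subgradient) model, whose proximally regularized minimization reduces to a stochastic subgradient step; the absolute-loss regression problem, the step schedule, and the problem sizes are illustrative choices, not the paper's experiments.

```python
# Sketch of one member of the model-based family: at each step, sample a data
# point, build the linear (subgradient) model of the loss at the current
# iterate, and minimize it plus a proximal term (beta/2)*||y - x||^2.
# For the linear model this step reduces to x - g/beta, i.e. a stochastic
# subgradient step. The absolute-loss regression problem below is only an
# illustrative stand-in, not an experiment from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 1000
x_true = rng.normal(size=d)
A = rng.normal(size=(n, d))
b = A @ x_true

def model_based_step(x, a, b_i, beta):
    r = a @ x - b_i
    g = np.sign(r) * a                   # subgradient of |a'x - b_i| at x
    return x - g / beta                  # argmin of linear model + prox term

x = np.zeros(d)
for k in range(20000):
    i = rng.integers(n)
    beta = np.sqrt(k + 1)                # growing prox parameter (inverse step size)
    x = model_based_step(x, A[i], b[i], beta)

print(np.linalg.norm(x - x_true))        # distance to the planted signal shrinks over the run
```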
Efficiency of minimizing compositions of convex functions and smooth maps
We consider global efficiency of algorithms for minimizing a sum of a convex
function and a composition of a Lipschitz convex function with a smooth map.
The basic algorithm we rely on is the prox-linear method, which in each
iteration solves a regularized subproblem formed by linearizing the smooth map.
When the subproblems are solved exactly, the method has efficiency
$\mathcal{O}(\varepsilon^{-2})$, akin to gradient descent for smooth
minimization. We show that when the subproblems can only be solved by
first-order methods, a simple combination of smoothing, the prox-linear method,
and a fast-gradient scheme yields an algorithm with complexity
$\widetilde{\mathcal{O}}(\varepsilon^{-3})$. The technique readily extends to
minimizing an average of $m$ composite functions, with complexity
$\widetilde{\mathcal{O}}\bigl(m\varepsilon^{-2} + \sqrt{m}\,\varepsilon^{-3}\bigr)$ in
expectation. We round off the paper with an inertial prox-linear method that
automatically accelerates in the presence of convexity.
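To make the iteration concrete: for a composite objective $h(c(x))$ with $h$ convex and $c$ smooth, the prox-linear step solves $\min_y\; h\bigl(c(x_k) + \nabla c(x_k)(y - x_k)\bigr) + \tfrac{1}{2t}\|y - x_k\|^2$. The sketch below specializes to $h = \tfrac12\|\cdot\|^2$, a choice made here only so the subproblem has a closed form (a regularized Gauss-Newton, Levenberg-Marquardt-type step); the toy residual map is hypothetical, not from the paper.

```python
# Sketch of the prox-linear iteration for minimizing h(c(x)) with h convex and
# c smooth: each step minimizes h(c(x_k) + J_k (y - x_k)) + (1/(2t))||y - x_k||^2.
# With h = 0.5*||.||^2 (nonlinear least squares) the subproblem is a regularized
# linear least-squares problem solved exactly by one linear solve.
import numpy as np

def prox_linear_nls(c, jac, x0, t=1.0, iters=50):
    x = x0.copy()
    for _ in range(iters):
        r, J = c(x), jac(x)
        # closed-form solution of the regularized linearized subproblem
        step = np.linalg.solve(J.T @ J + np.eye(len(x)) / t, -J.T @ r)
        x = x + step
    return x

# toy residual map c(x) = (x0^2 + x1 - 1, x0 + x1^2 - 1); its roots are minimizers
c = lambda x: np.array([x[0]**2 + x[1] - 1.0, x[0] + x[1]**2 - 1.0])
jac = lambda x: np.array([[2*x[0], 1.0], [1.0, 2*x[1]]])
print(prox_linear_nls(c, jac, np.array([0.5, 0.5])))
```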
Graphical Convergence of Subgradients in Nonconvex Optimization and Learning
We investigate the stochastic optimization problem of minimizing population
risk, where the loss defining the risk is assumed to be weakly convex.
Compositions of Lipschitz convex functions with smooth maps are the primary
examples of such losses. We analyze the estimation quality of such nonsmooth
and nonconvex problems by their sample average approximations. Our main results
establish dimension-dependent rates on subgradient estimation in full
generality and dimension-independent rates when the loss is a generalized
linear model. As an application of the developed techniques, we analyze the
nonsmooth landscape of a robust nonlinear regression problem.
Comment: 36 pages
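In the notation usually attached to such statements (the abstract does not spell it out): the population risk is $F(x) = \mathbb{E}_{\xi}\, f(x,\xi)$ and its sample average approximation is

\[
F_n(x) \;=\; \frac{1}{n}\sum_{i=1}^{n} f(x,\xi_i),
\]

and the estimation rates quantify how closely the subdifferential graph of $F_n$ tracks that of $F$ as the sample size $n$ grows.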
Complexity of finding near-stationary points of convex functions stochastically
In a recent paper, we showed that the stochastic subgradient method applied
to a weakly convex problem drives the gradient of the Moreau envelope to zero
at the rate $O(k^{-1/4})$. In this supplementary note, we present a stochastic
subgradient method for minimizing a convex function, with the improved rate
$\widetilde{O}(k^{-1/2})$.
Comment: 9 pages
Error bounds, quadratic growth, and linear convergence of proximal methods
The proximal gradient algorithm for minimizing the sum of a smooth and a
nonsmooth convex function often converges linearly even without strong
convexity. One common reason is that a multiple of the step length at each
iteration may linearly bound the "error" -- the distance to the solution set.
We explain the observed linear convergence intuitively by proving the
equivalence of such an error bound to a natural quadratic growth condition. Our
approach generalizes to linear convergence analysis for proximal methods (of
Gauss-Newton type) for minimizing compositions of nonsmooth functions with
smooth mappings. We observe incidentally that short step-lengths in the
algorithm indicate near-stationarity, suggesting a reliable termination
criterion.
Comment: 35 pages
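Schematically, for the composite problem $\min_x\, f(x) + g(x)$ with $f$ smooth, $g$ nonsmooth convex, solution set $\mathcal{S}$, and step length $t$, the two conditions shown equivalent are of the following form (the precise constants and neighborhoods are in the paper):

\[
\text{error bound:}\qquad \operatorname{dist}(x,\mathcal{S}) \;\le\; \gamma\,\bigl\| x - \operatorname{prox}_{t g}\bigl(x - t\nabla f(x)\bigr) \bigr\| \quad \text{for all } x \text{ near } \mathcal{S},
\]
\[
\text{quadratic growth:}\qquad f(x) + g(x) \;\ge\; \min\,(f+g) \;+\; \frac{\alpha}{2}\operatorname{dist}^2(x,\mathcal{S}) \quad \text{for all } x \text{ near } \mathcal{S}.
\]

The quantity $\|x - \operatorname{prox}_{tg}(x - t\nabla f(x))\|$ is exactly the step length of the proximal gradient method, which is also why short steps indicate near-stationarity.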
Semi-algebraic functions have small subdifferentials
We prove that the subdifferential of any semi-algebraic extended-real-valued
function on $\mathbb{R}^n$ has $n$-dimensional graph. We discuss consequences for
generic semi-algebraic optimization problems.
Comment: 21 pages, 1 figure, Accepted for publication in Mathematical Programming, Ser.
The nonsmooth landscape of phase retrieval
We consider a popular nonsmooth formulation of the real phase retrieval
problem. We show that under standard statistical assumptions, a simple
subgradient method converges linearly when initialized within a constant
relative distance of an optimal solution. Seeking to understand the
distribution of the stationary points of the problem, we complete the paper by
proving that as the number of Gaussian measurements increases, the stationary
points converge to a codimension two set, at a controlled rate. Experiments on
image recovery problems illustrate the developed algorithm and theory.
Comment: 42 pages, 15 figures
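The nonsmooth formulation in question is commonly written as $\min_x \tfrac{1}{m} \sum_{i=1}^m |\langle a_i, x\rangle^2 - b_i|$. Below is a minimal sketch of a subgradient method on this objective with the Polyak step length, one standard choice that is valid in the noiseless model since the minimum value is zero; the problem sizes, Gaussian data, and initialization scale are illustrative, not the paper's experimental setup.

```python
# Sketch of a subgradient method on the nonsmooth phase retrieval objective
#   f(x) = (1/m) * sum_i | <a_i, x>^2 - b_i |,
# using the Polyak step length (the noiseless minimum value is 0).
# Dimensions, data, and initialization are illustrative choices only.
import numpy as np

rng = np.random.default_rng(1)
d, m = 10, 300
x_true = rng.normal(size=d)
A = rng.normal(size=(m, d))
b = (A @ x_true) ** 2

def f_and_subgrad(x):
    r = (A @ x) ** 2 - b
    val = np.mean(np.abs(r))
    g = (2.0 / m) * (A.T @ (np.sign(r) * (A @ x)))   # a subgradient of f at x
    return val, g

x = x_true + 0.3 * rng.normal(size=d)                # start near a solution
for _ in range(500):
    val, g = f_and_subgrad(x)
    if val == 0.0:
        break
    x = x - val / (g @ g) * g                        # Polyak subgradient step

# distance to the signal, up to the global sign ambiguity
print(min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true)))
```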